Action-Aware Embedding Enhancement for Image-Text Retrieval
Abstract
Image-text retrieval plays a central role in bridging vision and language, and aims to reduce the semantic discrepancy between images and texts. Most existing works rely on refined word and object representations, using data-oriented methods to capture word-object co-occurrence. Such approaches are prone to ignore the asymmetric action relation between images and texts: the text has an explicit action representation (i.e., a verb phrase), while the image contains only implicit action information. In this paper, we propose the Action-aware Memory-Enhanced embedding (AME) method for image-text retrieval, which aims to emphasize action information when mapping images and texts into a shared embedding space. Specifically, we integrate action prediction along with an action-aware memory bank to enrich the image and text features with action-similar text features. The effectiveness of the proposed AME method is verified by comprehensive experimental results on two benchmark datasets.
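The abstract does not say how the action-aware memory bank is read, so the following is only a minimal sketch of one common way to enrich a feature with similar stored features. It assumes dot-product scoring, a softmax over memory entries, and a residual addition; the names `enrich`, `memory_bank`, and `temperature` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def enrich(feature, memory_bank, temperature=1.0):
    """Add an attention-weighted sum of memory entries to `feature`.

    Assumption: memory entries are text features grouped by action,
    and similarity is a plain dot product. Neither detail is confirmed
    by the abstract.
    """
    scores = [dot(feature, entry) / temperature for entry in memory_bank]
    weights = softmax(scores)
    retrieved = [
        sum(w * entry[i] for w, entry in zip(weights, memory_bank))
        for i in range(len(feature))
    ]
    enriched = [f + r for f, r in zip(feature, retrieved)]
    return enriched, weights


# Hypothetical 2-D toy data: one stored text feature per action.
memory_bank = [[1.0, 0.0], [0.0, 1.0]]
image_feature = [2.0, 0.0]
enriched, weights = enrich(image_feature, memory_bank)
```

In this toy setup the image feature aligns with the first memory entry, so that entry receives the larger attention weight and contributes most of the added signal. A learned model would typically replace the raw dot product with trained projections.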
Similar resources
Conditional Image-Text Embedding Networks
This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies th...
Content Aware Image Enhancement
We present our approach, integrating imaging and vision, for content-aware enhancement and processing of digital photographs. The overall quality of images is improved by a modular procedure automatically driven by the image class and content.
Image retrieval using the combination of text-based and content-based algorithms
Image retrieval is an important research field which has received great attention in the last decades. In this paper, we present an approach for the image retrieval based on the combination of text-based and content-based features. For text-based features, keywords and for content-based features, color and texture features have been used. Query in this system contains some keywords and an input...
Image-Based Document Vectors for Text Retrieval
We propose a method for constructing a vector for a document image to represent its content to facilitate text retrieval. The method is based on an N-Gram algorithm for text similarity measure based on the frequency of occurrence of n-character strings appearing in the electronic text. Instead of using ASCII values, the present study investigates the use of character images to obtain the docume...
Dual-Path Convolutional Image-Text Embedding
This paper considers the task of matching images and sentences. The challenge consists in discriminatively embedding the two modalities onto a shared visual-textual space. Existing work in this field largely uses Recurrent Neural Networks (RNN) for text feature learning and employs off-the-shelf Convolutional Neural Networks (CNN) for image feature extraction. Our system, in comparison, differs...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i2.20020